A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation
Abstract
The need for non-standard text categorisation, i.e. categorisation based on some subtle criterion other than topic, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test for determining the personality trait of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification problems turn out to perform poorly in this task. Here we propose a very simple probabilistic approach that achieves accurate predictions and demonstrates that this peculiar problem is still solvable by simple statistical text representation means. We then extend this approach to include a latent variable, in order to obtain additional explanatory information beyond a black-box prediction.
Similar resources
Improving Biomedical Text Categorisation with NLP
Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that the techniques are not adequate in some biological domains. Here we examine how chunking, n...
Reading and Assessing the City / Neighborhood Fabric As a Text. Case Study: Sar-Tapulah Historical Neighbourhood in Sanandaj
From a linguistic point of view, the city can be seen as a text, consisting of different components and structures being related to each other beyond a sentence. Looking at the city from this point of view, what establishes a syntactic relationship and cohesion and coherence of the components of the city as a common language is called the syntax of the city. Linguistic study of the text of the ...
Translation by Text Categorisation: Medical Image Retrieval in ImageCLEFmed 2006
We present the fusion of simple retrieval strategies with thesaural resources to perform document and query translation by text categorisation for cross–language retrieval in a collection of medical images with case notes. The collection includes documents in French, English and German. The fusion of visual and textual content is also treated. Unlike most automatic categorisation systems our ap...
Mapping Semantic Knowledge for Unsupervised Text Categorisation
Text categorisation is challenging, due to the complex structure with heterogeneous, changing topics in documents. The performance of text categorisation relies on the quality of samples, effectiveness of document features, and the topic coverage of categories, depending on the employing strategies; supervised or unsupervised; single labelled or multi-labelled. Attempting to deal with these rel...
SearchSleuth: The Conceptual Neighbourhood of an Web Query
This paper presents SearchSleuth, a program developed to experiment with a form of automatic local analysis that extends the standard Web search interface to include a conceptual neighbourhood focused on a formal concept derived from the query. The conceptual neighbourhood is displayed with upper neighbours representative of a generalisation operation, and lower neighbours representative of a s...
Publication date: 2008